Robust Subgraph Generation Improves Abstract Meaning Representation Parsing
The Abstract Meaning Representation (AMR) is a representation for open-domain
rich semantics, with potential use in fields like event extraction and machine
translation. Node generation, typically done using a simple dictionary lookup,
is currently an important limiting factor in AMR parsing. We propose a small
set of actions that derive AMR subgraphs by transformations on spans of text,
which allows for more robust learning of this stage. Our set of construction
actions generalizes better than the previous approach, and can be learned with a
simple classifier. We improve on the previous state-of-the-art result for AMR
parsing, boosting end-to-end performance by 3 F1 on both the LDC2013E117 and
LDC2014T12 datasets.
Comment: To appear in ACL 201
A large annotated corpus for learning natural language inference
Understanding entailment and contradiction is fundamental to understanding
natural language, and inference about entailment and contradiction is a
valuable testing ground for the development of semantic representations.
However, machine learning research in this area has been dramatically limited
by the lack of large-scale resources. To address this, we introduce the
Stanford Natural Language Inference corpus, a new, freely available collection
of labeled sentence pairs, written by humans doing a novel grounded task based
on image captioning. At 570K pairs, it is two orders of magnitude larger than
all other resources of its type. This increase in scale allows lexicalized
classifiers to outperform some sophisticated existing entailment models, and it
allows a neural network-based model to perform competitively on natural
language inference benchmarks for the first time.
Comment: To appear at EMNLP 2015. The data will be posted shortly before the
conference (the week of 14 Sep) at http://nlp.stanford.edu/projects/snli
The SNLI Corpus
The SNLI corpus (version 1.0) is a collection of 570k human-written English sentence pairs manually labeled for balanced classification with the labels entailment, contradiction, and neutral, supporting the task of natural language inference (NLI), also known as recognizing textual entailment (RTE). We aim for it to serve both as a benchmark for evaluating representational systems for text, especially those induced by representation learning methods, and as a resource for developing NLP models of any kind.
We gratefully acknowledge support from a Google Faculty Research Award, a gift from Bloomberg L.P., the Defense Advanced Research Projects Agency (DARPA) Deep Exploration and Filtering of Text (DEFT) Program under Air Force Research Laboratory (AFRL) contract no. FA8750-13-2-0040, the National Science Foundation under grant no. IIS 1159679, and the Department of the Navy, Office of Naval Research, under grant no. N00014-10-1-0109. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of Google, Bloomberg L.P., DARPA, AFRL, NSF, ONR, or the US government. We also thank our many excellent Mechanical Turk contributors.
Relation Extraction with Self-determined Graph Convolutional Network
Relation Extraction is a way of obtaining the semantic relationship between
entities in text. The state-of-the-art methods use linguistic tools to build a
graph for the text in which the entities appear and then a Graph Convolutional
Network (GCN) is employed to encode the pre-built graphs. Although their
performance is promising, the reliance on linguistic tools results in a
non-end-to-end process. In this work, we propose a novel model, the Self-determined
Graph Convolutional Network (SGCN), which determines a weighted graph using a
self-attention mechanism, rather than using any linguistic tool. Then, the
self-determined graph is encoded using a GCN. We test our model on the TACRED
dataset and achieve the state-of-the-art result. Our experiments show that SGCN
outperforms the traditional GCN, which uses dependency parsing tools to build
the graph.
Comment: CIKM-202
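The two steps named in the abstract above (a weighted graph determined by self-attention, then GCN encoding of that graph) can be sketched roughly as follows. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: single-head scaled dot-product attention, a row-wise softmax as the edge weighting, a ReLU activation, and the hypothetical names self_determined_adjacency and gcn_layer. It is not the authors' implementation.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of attention scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # Plain-Python matrix product of a (n x k) and b (k x m).
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def self_determined_adjacency(h, w_q, w_k):
    # Assumption: edge weights come from scaled dot-product self-attention
    # over token representations h (one row per token).
    q = matmul(h, w_q)
    k = matmul(h, w_k)
    scale = math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))
    return [softmax([s / scale for s in row]) for row in scores]

def gcn_layer(adj, h, w):
    # One graph-convolution step: aggregate neighbours by edge weight,
    # apply a linear map, then ReLU (the activation is an assumption).
    aggregated = matmul(adj, h)
    projected = matmul(aggregated, w)
    return [[max(0.0, x) for x in row] for row in projected]

# Toy input: three tokens with two-dimensional features.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder for learned weights
adj = self_determined_adjacency(h, identity, identity)
encoded = gcn_layer(adj, h, identity)
```

In this sketch the adjacency matrix is dense and each row sums to one, so no dependency parser is needed to decide which tokens are connected; the identity matrices stand in for parameters that would be learned in practice.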
Language-Independent Discriminative Parsing of Temporal Expressions
Temporal resolution systems are traditionally tuned to a particular language, requiring significant human effort to translate them to new languages. We present a language-independent semantic parser for learning the interpretation of temporal phrases given only a corpus of utterances and the times they reference. We make use of a latent parse that encodes a language-flexible representation of time, and extract rich features over both the parse and associated temporal semantics. The parameters of the model are learned using a weakly supervised bootstrapping approach, without the need for manually tuned parameters or any other language expertise. We achieve state-of-the-art accuracy on all languages in the TempEval-2 temporal normalization task, reporting a 4% improvement in both English and Spanish accuracy, and to our knowledge the first results for four other languages.
Philosophers are Mortal: Inferring the Truth of Unseen Facts
Large databases of facts are prevalent in many applications. Such databases are accurate, but as they broaden their scope they become increasingly incomplete. In contrast to extending such a database, we present a system to query whether it contains an arbitrary fact. This work can be thought of as re-casting open domain information extraction: rather than growing a database of known facts, we smooth this data into a database in which any possible fact has membership with some confidence. We evaluate our system on predicting held-out facts, achieving 74.2% accuracy and outperforming multiple baselines. We also evaluate the system as a commonsense filter for the ReVerb Open IE system, and as a method for answer validation in a Question Answering task.
Combining Distant and Partial Supervision for Relation Extraction
Broad-coverage relation extraction either requires expensive supervised training data, or suffers from drawbacks inherent to distant supervision. We present an approach for providing partial supervision to a distantly supervised relation extractor using a small number of carefully selected examples. We compare against established active learning criteria and propose a novel criterion to sample examples which are both uncertain and representative. In this way, we combine the benefits of fine-grained supervision for difficult examples with the coverage of a large distantly supervised corpus. Our approach gives a substantial increase of 3.9% end-to-end F1 on the 2013 KBP Slot Filling evaluation, yielding a net F1 of 37.7%.